
Conversation


@ayushag-nv ayushag-nv commented Nov 6, 2025

Overview:

Problem:
(screenshot: stream response containing multiple finish reasons)

  • There are multiple finish reasons in the stream response. This is caused by emitting empty or leftover chunks even after a previous chunk has already carried a finish reason.
  • JailedStream ignores every finish_reason in the stream except the last one.

How vLLM Handles It:

  • Overwrites the last chunk's finish_reason with ToolCalls when the finish reason is Stop and tool calls were produced.

Proposed Changes:

  • Separate the post-processing logic in JailedStream: it takes the jail stream output and overwrites the last chunk's finish_reason with ToolCalls if any choice contains tool calls.
  • Use the original stream's finish_reason wherever possible.
  • Updated unit tests to validate that exactly one finish reason appears across the stream, and to check its value.
  • Added tests for edge cases where finish_reason = length.
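The post-processing step described above can be sketched as follows. This is a minimal, self-contained illustration, not the actual crate API: the `FinishReason`, `Choice`, `Chunk`, and `post_process` names are hypothetical stand-ins for the real types in `jail.rs`.

```rust
// Hypothetical sketch of the JailedStream post-processing step: after the
// jailed stream is drained, rewrite the final chunk's finish_reason to
// ToolCalls when any choice in the stream produced tool calls.

#[derive(Debug, Clone, Copy, PartialEq)]
enum FinishReason {
    Stop,
    Length,
    ToolCalls,
}

#[derive(Debug, Clone)]
struct Choice {
    finish_reason: Option<FinishReason>,
    has_tool_calls: bool,
}

#[derive(Debug, Clone)]
struct Chunk {
    choices: Vec<Choice>,
}

/// Overwrite the last chunk's Stop finish_reason with ToolCalls
/// when any choice in the stream carried tool calls.
fn post_process(chunks: &mut [Chunk]) {
    let any_tool_calls = chunks
        .iter()
        .flat_map(|c| c.choices.iter())
        .any(|ch| ch.has_tool_calls);
    if let Some(last) = chunks.last_mut() {
        for choice in &mut last.choices {
            if any_tool_calls && choice.finish_reason == Some(FinishReason::Stop) {
                choice.finish_reason = Some(FinishReason::ToolCalls);
            }
        }
    }
}
```

Note that finish_reason = Length is deliberately left untouched, which is what the length edge-case tests cover.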

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

Release Notes

  • Bug Fixes
    • Improved handling of finish_reason signals in streaming chat completions to prevent duplicate emissions and ensure accurate state tracking across multiple choice responses.

@ayushag-nv ayushag-nv requested a review from a team as a code owner November 6, 2025 06:58
@copy-pr-bot

copy-pr-bot bot commented Nov 6, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: ayushag <[email protected]>
@ayushag-nv ayushag-nv requested a review from grahamking November 6, 2025 07:04
@ayushag-nv ayushag-nv self-assigned this Nov 6, 2025

coderabbitai bot commented Nov 6, 2025

Walkthrough

These changes implement per-choice tracking of finish_reason emission states in streaming chat completion processing, preventing duplicate emissions across chunks. Additionally, test coverage is enhanced with assertions validating finish_reason values across tool parsing scenarios.

Changes

Cohort / File(s) Summary
Streaming finish reason state tracking
lib/llm/src/protocols/openai/chat_completions/jail.rs
Introduces finish_reason_emitted map to track per-choice emission state. Adds skip logic to prevent reprocessing choices with emitted finish reasons. Propagates state through various emission paths (responses, tool content, pass-throughs) and final stream end emissions.
Test coverage enhancements
lib/llm/tests/test_streaming_tool_parsers.rs
Adds FinishReason import and validates finish reason values (Stop or ToolCalls) on final chunks. Extends assertions to verify tool_calls data presence alongside finish reasons across no-tool and tool-enabled scenarios.
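The per-choice tracking summarized above can be sketched roughly as below. This is an illustrative reduction under assumed names: `Tracker` and `should_emit` are hypothetical, standing in for the `finish_reason_emitted` map and skip logic in `jail.rs`.

```rust
use std::collections::HashMap;

// Hypothetical sketch of per-choice finish_reason emission tracking:
// once a finish_reason has been emitted for a choice index, any leftover
// chunks for that choice are skipped rather than re-emitted.

struct Tracker {
    finish_reason_emitted: HashMap<u32, bool>,
}

impl Tracker {
    fn new() -> Self {
        Self { finish_reason_emitted: HashMap::new() }
    }

    /// Returns true if a chunk for this choice should be emitted.
    /// Marks the choice as finished once a finish_reason passes through.
    fn should_emit(&mut self, choice_index: u32, has_finish_reason: bool) -> bool {
        if *self.finish_reason_emitted.get(&choice_index).unwrap_or(&false) {
            return false; // finish_reason already sent; drop leftover chunks
        }
        if has_finish_reason {
            self.finish_reason_emitted.insert(choice_index, true);
        }
        true
    }
}
```

Keying the map by choice index is what makes the skip logic safe for multi-choice responses: finishing choice 0 does not suppress later chunks for choice 1.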

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Jail logic state propagation: Verify finish_reason_emitted state is correctly maintained across all emission paths (responses, tool content, pass-throughs, final emissions).
  • Double-emission prevention: Ensure the skip check properly prevents reprocessing and no valid chunks are unexpectedly filtered.
  • Test assertion coverage: Confirm finish reason assertions are consistent with expected behavior across all tool parsing paths.

Poem

🐰 A reason to finish, now tracked with great care,
No duplicate reasons shall float in the air!
Through streams and through choices, state flows with precision,
Each chunk meets its end with a clear, clean decision. ✨

Pre-merge checks

❌ Failed checks (1 warning)
  • Description check — ⚠️ Warning: The PR description is incomplete and missing critical sections required by the template. Resolution: complete the 'Details' and 'Where should the reviewer start?' sections with a comprehensive description of changes and specific files to review, and update or remove the 'Related Issues' placeholder (#xxx).
✅ Passed checks (2 passed)
  • Title check — ✅ Passed: The title clearly and concisely describes the main problem being fixed: preventing multiple finish reasons in stream responses.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
lib/llm/src/protocols/openai/chat_completions/jail.rs (1)

536-552: Avoid cloning the emission Vec before dispatch

tool_content_emissions isn’t reused after this call, so cloning the entire Vec just to satisfy ownership creates extra allocations on the hot path. Move the Vec directly into emit_choice_emissions instead.

-                            let responses = self.emit_choice_emissions(tool_content_emissions.clone(), chat_response, preserved_metadata);
+                            let responses = self.emit_choice_emissions(tool_content_emissions, chat_response, preserved_metadata);
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a629b86 and 2a029f8.

📒 Files selected for processing (2)
  • lib/llm/src/protocols/openai/chat_completions/jail.rs (7 hunks)
  • lib/llm/tests/test_streaming_tool_parsers.rs (10 hunks)
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: oandreeva-nv
Repo: ai-dynamo/dynamo PR: 2989
File: lib/llm/src/block_manager/distributed/transfer.rs:6-6
Timestamp: 2025-09-18T21:47:44.143Z
Learning: For PR ai-dynamo/dynamo#2989, the ConnectorTransferBatcher architectural issues will be addressed in a follow-up PR by removing the duplicate batching logic and integrating distributed transfers with the existing TransferBatcher + LocalTransferManager pipeline, rather than adding bounded concurrency primitives like Semaphore.
📚 Learning: 2025-09-10T15:27:42.511Z
Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 2932
File: lib/llm/src/preprocessor.rs:768-844
Timestamp: 2025-09-10T15:27:42.511Z
Learning: In the tool calling jail implementation in lib/llm/src/preprocessor.rs, the design intentionally emits only the first accumulated choice that contains tool calls during unjailing, dropping other accumulated choices. This is a deliberate design decision, not a bug.

Applied to files:

  • lib/llm/src/protocols/openai/chat_completions/jail.rs
📚 Learning: 2025-08-22T19:55:41.608Z
Learnt from: nachiketb-nvidia
Repo: ai-dynamo/dynamo PR: 2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: The create_choice method exists on multiple different objects in the codebase. The DeltaGenerator::create_choice in lib/llm/src/protocols/openai/chat_completions/delta.rs has its own signature that was updated to include reasoning_content, but other objects in lib/llm/src/engines.rs have their own separate create_choice methods with different signatures that are not related to chat completions.

Applied to files:

  • lib/llm/src/protocols/openai/chat_completions/jail.rs
  • lib/llm/tests/test_streaming_tool_parsers.rs
📚 Learning: 2025-08-22T19:55:41.608Z
Learnt from: nachiketb-nvidia
Repo: ai-dynamo/dynamo PR: 2656
File: lib/llm/src/protocols/openai/chat_completions/delta.rs:320-327
Timestamp: 2025-08-22T19:55:41.608Z
Learning: There are two separate DeltaGenerator classes in the codebase: one for chat completions (lib/llm/src/protocols/openai/chat_completions/delta.rs with object "chat.completion.chunk") and one for text completions (lib/llm/src/protocols/openai/completions/delta.rs with object "text_completion"). They have different create_choice method signatures and serve different OpenAI API endpoints. The reasoning parsing functionality is only relevant to the chat completions DeltaGenerator.

Applied to files:

  • lib/llm/src/protocols/openai/chat_completions/jail.rs
  • lib/llm/tests/test_streaming_tool_parsers.rs
📚 Learning: 2025-09-11T03:24:47.820Z
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3004
File: lib/runtime/src/pipeline/network/ingress/push_handler.rs:271-277
Timestamp: 2025-09-11T03:24:47.820Z
Learning: In lib/runtime/src/pipeline/network/ingress/push_handler.rs, the maintainer prefers to keep the existing error comparison logic using format!("{:?}", err) == STREAM_ERR_MSG unchanged until proper error types are implemented, even though it has technical debt. Avoid suggesting changes to working legacy code that will be refactored later.

Applied to files:

  • lib/llm/tests/test_streaming_tool_parsers.rs
📚 Learning: 2025-09-10T05:04:58.417Z
Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 2932
File: lib/llm/src/protocols/openai/chat_completions/aggregator.rs:66-86
Timestamp: 2025-09-10T05:04:58.417Z
Learning: In the dynamo codebase, tool call chunks from streaming responses always contain complete tool calls (one chunk = one tool call), unlike standard OpenAI streaming where tool calls can be fragmented across multiple chunks. The convert_tool_chunk_to_message_tool_call function correctly assumes complete tool call data in each chunk.

Applied to files:

  • lib/llm/tests/test_streaming_tool_parsers.rs
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: clippy (.)
  • GitHub Check: clippy (lib/runtime/examples)
  • GitHub Check: clippy (lib/bindings/python)
  • GitHub Check: clippy (launch/dynamo-run)
  • GitHub Check: Build and Test - dynamo

Signed-off-by: ayushag <[email protected]>

@elyasmnvidian elyasmnvidian left a comment


Address comments please

Signed-off-by: ayushag <[email protected]>
Signed-off-by: ayushag <[email protected]>
@ayushag-nv ayushag-nv enabled auto-merge (squash) November 7, 2025 16:23
@ayushag-nv

/ok to test e814e28


rmccorm4 commented Nov 7, 2025

/ok to test 3557327

